Centroid estimation based on symmetric KL divergence for Multinomial text classification problem
Authors
Abstract
We define a new centroid estimator for text classification based on the KL-divergence of the classes. The score favors documents that have a similar distribution to documents of the same class but different distributions from documents of other classes. Experiments on several standard data sets indicate that the new method outperforms the traditional Naive Bayes classifier, especially for larger training sets.
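The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. Each class centroid is assumed to be a Laplace-smoothed multinomial word distribution (smoothing parameter `alpha` is an assumption), and a document is assigned to the class whose centroid has the smallest symmetric KL (Jeffreys) divergence, KL(p||q) + KL(q||p) = Σ (p − q) log(p/q). The function names and the toy training data are hypothetical and invented for illustration.

```python
import math
from collections import Counter

# Hypothetical sketch: nearest-centroid text classification with symmetric KL.
# Assumption: centroids are Laplace-smoothed multinomial word distributions.

def word_distribution(docs, vocab, alpha=1.0):
    """Smoothed word distribution over `vocab` for a list of tokenized docs."""
    counts = Counter()
    for doc in docs:
        counts.update(w for w in doc if w in vocab)  # ignore out-of-vocabulary words
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p), written as sum_w (p_w - q_w) * log(p_w / q_w)."""
    return sum((p[w] - q[w]) * math.log(p[w] / q[w]) for w in p)

def train_centroids(labeled_docs, alpha=1.0):
    """Build the vocabulary and one smoothed word distribution per class."""
    vocab = {w for doc, _ in labeled_docs for w in doc}
    by_class = {}
    for doc, label in labeled_docs:
        by_class.setdefault(label, []).append(doc)
    centroids = {c: word_distribution(ds, vocab, alpha) for c, ds in by_class.items()}
    return vocab, centroids

def classify(doc, vocab, centroids, alpha=1.0):
    """Pick the class whose centroid is closest in symmetric KL divergence."""
    d = word_distribution([doc], vocab, alpha)
    return min(centroids, key=lambda c: symmetric_kl(d, centroids[c]))

# Toy training data (invented for illustration only).
train = [
    ("the striker scored a goal".split(), "sports"),
    ("the team won the match".split(), "sports"),
    ("the market fell as stocks dropped".split(), "finance"),
    ("investors bought shares and stocks".split(), "finance"),
]
vocab, centroids = train_centroids(train)
print(classify("stocks and shares rallied".split(), vocab, centroids))  # → finance
```

Smoothing keeps every probability strictly positive, so the logarithms in `symmetric_kl` are always defined; unlike plain KL, the symmetric form gives the same distance whichever argument is treated as the reference.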
Similar resources
A New Feature Selection Score for Multinomial Naive Bayes Text Classification Based on KL-Divergence
We define a new feature selection score for text classification based on the KL-divergence between the distribution of words in training documents and their classes. The score favors words that have a similar distribution in documents of the same class but different distributions in documents of different classes. Experiments on two standard data sets indicate that the new method outperforms mu...
Probabilistic Hierarchical Clustering Method for Organizing Collections of Text Documents
In this paper, a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, and these have been termed symmetric and asymmetric models. For text data specifically both asymmetric and sy...
A Probabilistic Hierarchical Clustering Method for Organising Collections of Text Documents
In this paper, a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, and these have been termed symmetric and asymmetric models. For text data specifically both asymmetric and sy...
Large margin multinomial mixture model for text categorization
In this paper, we present a novel discriminative training method for multinomial mixture models (MMM) in text categorization based on the principle of large margin. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as linear programming (LP) problems, which can be efficiently and reliably solved by many general optimization tools even fo...
An Estimation of Required Rotational Torque to Operate Horizontal Directional Drilling Using Rock Engineering Systems
Horizontal directional drilling (HDD) is widely used in soil and rock engineering. In a variety of conditions, it is necessary to estimate the torque required for performing the reaming operation. However, no convenient method currently exists for this task. In this paper, to overcome this difficulty, based on the basic concepts of rock engineering systems (RES), a model for ...